best arm
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
An Optimal Elimination Algorithm for Learning a Best Arm
We consider the classic problem of $(\epsilon,\delta)$-\texttt{PAC} learning a best arm where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst case sample complexity is not well understood. In this paper we propose a new approach for $(\epsilon,\delta)$-\texttt{PAC} learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(\epsilon,\delta)$-learning the mean of $n$ arms separately and we complement this result with a conditional matching lower bound.
On Regret with Multiple Best Arms
We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and the information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and theoretically derive its regret bound. We then prove a lower bound for our problem setting, which indicates: (1) no algorithm can be minimax optimal simultaneously over all hardness levels; and (2) our algorithm achieves a rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over all hardness levels. Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art.
Verification Based Solution for Structured MAB Problems
We consider the problem of finding the best arm in a stochastic Mutli-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the best arm. Our results are focused on the scenario where the failure probability $\delta$ must be very low; we essentially show that in this high confidence regime, identifying the best arm is as easy as the task of verification. We demonstrate the effectiveness of our framework by applying it, and improving the state-of-the art results in the problems of: Linear bandits, Dueling bandits with the Condorcet assumption, Copeland dueling bandits, Unimodal bandits and Graphical bandits.
- North America > United States > New York (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (0.68)
- Information Technology > Data Science > Data Mining > Big Data (0.47)
- Information Technology > Communications > Social Media > Crowdsourcing (0.42)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (7 more...)
- Asia > Singapore (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)